Tone Generation by Maximizing Joint Likelihood of Syllabic HMMs for Mandarin Speech Synthesis
Authors
Abstract
A tone generation method that maximizes the joint likelihood of syllabic HMMs is proposed to improve Mandarin speech synthesis. The F0 sequence is generated by jointly maximizing the likelihood of the state-level F0 model and the syllable-level tone model, under a constraint on the mean F0 of adjacent units. The optimal weight of the tone component is searched for in terms of the parameter generation error and the correlation coefficient. Both objective and subjective evaluations confirm the positive effect of the method: the generation error is reduced by 26.7%, the correlation coefficient is increased by 6.5%, and prosody perception is significantly improved.
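The abstract does not spell out the model equations, so the following Python code is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes static F0 only (no delta features), diagonal Gaussians for the state-level F0 model (per-frame mean `mu` and variance `var`), and, as a hypothetical stand-in for the syllable-level tone model, a Gaussian over two linear features of each syllable's contour: its mean level and its least-squares slope (means `nu`, variances `v`). The weighted objective log N(c; mu, D) + w log N(Ac; nu, V) is concave in c, so its maximizer has the closed form c = mu + D A^T (V/w + A D A^T)^(-1) (nu - A mu). The constraint on the mean F0 of adjacent units is omitted, the weight-search criterion (lowest RMSE, ties broken by higher Pearson correlation) is an assumption, and all function names are hypothetical.

```python
import math

# HYPOTHETICAL sketch of weighted joint-likelihood F0 generation (not the paper's code).
# Assumptions: static F0 only, diagonal Gaussians, and a tone model that is a Gaussian
# over two features of each syllable's contour (mean level and linear slope).


def tone_features_matrix(n):
    """Return A (2 x n): row 0 averages the frames; row 1 yields the least-squares
    slope of F0 against normalized frame position in [0, 1]."""
    xs = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    mean_row = [1.0 / n] * n
    slope_row = [(x - xbar) / sxx if sxx > 0 else 0.0 for x in xs]
    return [mean_row, slope_row]


def generate_syllable(mu, var, nu, v, w):
    """Maximize log N(c; mu, diag(var)) + w * log N(A c; nu, diag(v)) in closed form:
    c = mu + D A^T (V/w + A D A^T)^{-1} (nu - A mu)."""
    if w <= 0:
        return list(mu)  # tone model switched off: the state-level ML output is the mean
    n = len(mu)
    A = tone_features_matrix(n)
    # S = V/w + A D A^T is 2 x 2 and positive definite, so det > 0
    S = [[(v[i] / w if i == j else 0.0)
          + sum(A[i][t] * var[t] * A[j][t] for t in range(n))
          for j in range(2)] for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    # Residual between tone-model means and the features of the state-level means
    r = [nu[i] - sum(A[i][t] * mu[t] for t in range(n)) for i in range(2)]
    g = [sum(S_inv[i][j] * r[j] for j in range(2)) for i in range(2)]
    # Add D A^T g to the state-level means
    return [mu[t] + var[t] * sum(A[i][t] * g[i] for i in range(2)) for t in range(n)]


def generate_utterance(syllables, w):
    """syllables: list of (mu, var, nu, v) tuples; returns the concatenated F0 track."""
    out = []
    for mu, var, nu, v in syllables:
        out.extend(generate_syllable(mu, var, nu, v, w))
    return out


def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx * syy > 0 else 0.0


def search_weight(syllables, reference, weights):
    """Grid-search the tone weight (ASSUMED criterion): lowest RMSE against the
    reference F0, ties broken by higher correlation."""
    best = None
    for w in weights:
        f0 = generate_utterance(syllables, w)
        key = (rmse(f0, reference), -pearson(f0, reference))
        if best is None or key < best[0]:
            best = (key, w)
    return best[1]


if __name__ == "__main__":
    # Toy, made-up numbers for illustration only (not data from the paper).
    syllables = [
        ([200.0, 200.0, 200.0], [1.0, 1.0, 1.0], [210.0, 30.0], [1.0, 1.0]),
        ([180.0, 185.0, 190.0], [4.0, 4.0, 4.0], [175.0, -20.0], [2.0, 2.0]),
    ]
    reference = generate_utterance(syllables, 1.0)
    print(search_weight(syllables, reference, [0.0, 0.5, 1.0, 2.0, 4.0]))
```

Setting w = 0 reduces the output to the state-level means, while a very large w forces each syllable's mean and slope onto the tone model; the grid search selects a compromise between the two according to the chosen error and correlation measures.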
Similar papers
Prosodic Alternative Units in a Mandarin Chinese Speech Synthesizer
The Mandarin Chinese synthesis component of the Dresden Speech Synthesizer DreSS is based on an inventory of syllabic units. The inventory contains all Chinese syllables with the possible tones in up to three phonetic variations for a correct modeling of the cross syllable coarticulation effects. In order to improve the naturalness and fluency of the synthesized speech, the inventory was comple...
Generation of Fundamental Frequency Contours of Mandarin in HMM-based Speech Synthesis using Generation Process Model
The HMM-based speech synthesis system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. In this approach, short-term spectra, fundamental frequency (F0) and duration are generated separately by multi-stream HMMs. However, the quality of synthetic speech degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tr...
Quantitative analysis of F0 contours of emotional speech of Mandarin
For emotional speech synthesis, a quantitative model giving a parametric representation of F0 contours is needed. Purpose: to investigate quantitatively the F0 characteristics of Mandarin speech in four basic emotions (anger, fear, joy, and sadness) and in neutral reading. Two approaches are compared: surface feature analysis of time-normalized F0 contours and analysis-by-synthesis of time-intact F0 co...
An HMM-Based Mandarin Chinese Text-To-Speech System
In this paper we present our Hidden Markov Model (HMM)-based, Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese or Putonghua, “the common spoken language”, is a tone language where each of the 400 plus base syllables can have up to 5 different lexical tone patterns. Their segmental and supra-segmental information is first modeled by 3 corresponding HMMs, including: (1) spectral env...
On the Duration of Mandarin Tones
The present study compared the duration of Mandarin tones in three types of speech contexts: isolated monosyllables, formal text-reading passages, and casual conversations. A total of 156 adult speakers was recruited. The speech materials included 44 monosyllables recorded from each of 121 participants, 18 passages read by 2 participants, and 20 conversations conducted by 33 participants. The d...